-------------------
GENERAL INFORMATION
-------------------

This dataset contains the data files used in the study "Feeding Health Inequality through Platform-Based Food Delivery in China," coauthored by Hai Ding, Chenran Liu, Yu Xie, Jia Yu and Zhengrong Yuan. The study examines how the expansion of online food delivery platforms affects health inequality in China by combining county-level food delivery platform entry data with individual-level survey data from the China Family Panel Studies (CFPS).

The released files provide information on the timing and expansion of food delivery platform entry at the county level, county-level characteristics, pre-treatment county-level control variables, and replication code for the regression results.

Please note that reproducing the full set of empirical results requires access to the CFPS data. Individual-level CFPS public-use data can be obtained after registration through the CFPS data platform. Restricted CFPS data containing county-level geographic identifiers must be accessed through designated data laboratories approved by the Institute of Social Science Survey (ISSS), Peking University.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

The data files provided in this release are intended for academic research and replication purposes. Users should cite the corresponding article when using these files. [Citation information will be updated after publication.]

The China Family Panel Studies (CFPS) data used in the study are subject to the data-use requirements of the CFPS project and are not redistributed in this release. Individual-level CFPS public-use data can be accessed after personal registration through the CFPS data platform: https://cfpsdata.pku.edu.cn/.

Restricted CFPS data containing county-level geographic identifiers are not publicly downloadable. These data must be accessed through designated data laboratories approved by the ISSS at Peking University. For details, please see: https://cfpsdata.pku.edu.cn/#/newsDetail/51.

Users who wish to reproduce the full empirical analyses should obtain the required CFPS public-use and restricted-use data directly from the CFPS project and comply with all relevant data-use agreements and confidentiality requirements.

--------------------
DATA & FILE OVERVIEW
--------------------

Local files:

- README.txt
- food_delivery_entry_count.xlsx
- county_characteristics.dta
- county_characteristics_2012.dta
- food_delivery_pilot.dta
- regression code.do

External files:

- China Family Panel Studies (CFPS) public-use data
- China Family Panel Studies (CFPS) restricted-use data with county-level geographic identifiers

Relationship between files, if important for context:

[food_delivery_entry_count.xlsx] provides monthly and cumulative information on the rollout of food delivery platforms across counties and is used to reproduce Fig. 1. It was constructed by aggregating the platform entry information in [food_delivery_pilot.dta].
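
For readers who want to see what this kind of aggregation looks like, the following is a minimal Python (pandas) sketch. It is not the authors' code, and the released replication code is in Stata. The column name `entry_date` is a hypothetical placeholder; check the actual variable names in [food_delivery_pilot.dta] before use.

```python
import pandas as pd


def monthly_entry_counts(pilot: pd.DataFrame,
                         date_col: str = "entry_date") -> pd.DataFrame:
    """Count counties entered per calendar month and accumulate over time.

    `date_col` is an assumed column name holding each county's platform
    entry date; it is not taken from the released file.
    """
    # Counties with no recorded entry date are dropped from the counts.
    dates = pd.to_datetime(pilot[date_col]).dropna()
    months = dates.dt.to_period("M")

    # Number of counties entered in each month, in chronological order.
    new_entries = months.value_counts().sort_index()

    # Insert months in which no county was entered so the series is continuous.
    full_index = pd.period_range(new_entries.index.min(),
                                 new_entries.index.max(), freq="M")
    new_entries = new_entries.reindex(full_index, fill_value=0)

    out = pd.DataFrame({
        "month": full_index.astype(str),
        "new_entries": new_entries.to_numpy(),
    })
    out["cumulative_entries"] = out["new_entries"].cumsum()
    return out
```

With the real file, the input could be loaded with `pd.read_stata("food_delivery_pilot.dta")`, after replacing the placeholder column name with the one actually used in the data.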

[county_characteristics.dta] provides annual county-level socioeconomic characteristics from 2010 to 2022, along with geographic characteristics, and is used to estimate the exponential hazard model for platform entry timing reported in Table S3.
Sources: The county-level socioeconomic variables (lnrgdp, lnpop_density, and lnsales_consumer) were obtained from the China County Statistical Yearbook, and the terrain slope variable (slope) was derived from the Digital Elevation Model in the 1:1 Million Scale National Basic Geographic Database.

[county_characteristics_2012.dta] provides pre-treatment county-level characteristics measured in 2012.
Sources: The county-level socioeconomic variables (lnrgdp_2012, lnpop_density_2012, and lnsales_consumer_2012) were obtained from the China County Statistical Yearbook, and the terrain slope variable (slope) was derived from the Digital Elevation Model in the 1:1 Million Scale National Basic Geographic Database. The county-level Gini coefficient of household income (gini_manual2005) was constructed using the 1% random sample of the 2005 Population Census, while the log numbers of gyms, restaurants, and grocery stores in 2012 (lngyms_2012, lnrestaurants_2012, and lnstores_2012) were constructed from AMap point-of-interest data.

[food_delivery_pilot.dta] provides the timing of food delivery platform entry across districts/counties. Together with [county_characteristics_2012.dta], this file is used to construct the treatment variables and county-level controls for the main regression analyses, mechanism analyses, heterogeneity analyses, and robustness checks.
Sources: The food delivery platform entry data were constructed by the authors using archived webpages from the Wayback Machine and job-posting information from major recruitment websites, including Zhaopin, 51Job, and Liepin.

[regression code.do] provides the Stata code used to reproduce the regression results.

To reproduce the full set of empirical results, users need to merge the provided county-level platform entry and county-characteristics files with CFPS individual-level data. The CFPS data are provided by the ISSS, Peking University. Because the CFPS data are subject to data-use restrictions, the individual-level CFPS data and restricted county-level geographic identifiers are not included in this release.
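
The merge described above can be sketched as follows in Python (pandas). This is an illustration under stated assumptions, not the authors' procedure, which is implemented in [regression code.do]. The key name `county_id` is hypothetical: the actual county identifier is available only in the restricted CFPS data, and its name in the released files should be checked before use.

```python
import pandas as pd


def attach_county_data(cfps: pd.DataFrame,
                       pilot: pd.DataFrame,
                       controls: pd.DataFrame,
                       key: str = "county_id") -> pd.DataFrame:
    """Attach county-level entry timing and controls to individual records.

    `key` is an assumed name for the county identifier shared by all
    three tables; it is not taken from the released files.
    """
    # One row per county: entry timing plus pre-treatment controls.
    # validate= raises an error if either table has duplicate counties.
    county = pilot.merge(controls, on=key, how="left",
                         validate="one_to_one")

    # Many individuals map to one county. indicator=True adds a `_merge`
    # column so unmatched individuals can be inspected after the merge.
    return cfps.merge(county, on=key, how="left",
                      validate="many_to_one", indicator=True)
```

Rows where `_merge` equals `left_only` are individuals whose county has no match in the county-level files, and they are worth inspecting before any estimation.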